ABSTRACT

The Split Ends (SPEN) protein was originally discov- 
ered in Drosophila in the late 1990s. Since then, ho- 
mologous proteins have been identified in eukaryotic 
species ranging from plants to humans. Every family 
member contains three predicted RNA recognition 
motifs (RRMs) in the N-terminal region of the protein.
We have determined, at 2.0 Å resolution, the crystal structure
of the RRM-containing region of the human SPEN homolog,
the SMRT/HDAC1 Associated Repressor Protein (SHARP). SHARP is
a co-regulator of the nuclear receptors. We demon- 
strate that two of the three RRMs, namely RRM3 and 
RRM4, interact via a highly conserved interface. Fur- 
thermore, we show that the RRM3-RRM4 block is 
the main platform mediating the stable association 
with the H12-H13 substructure found in the steroid 
receptor RNA activator (SRA), a long, non-coding 
RNA previously shown to play a crucial role in nu- 
clear receptor transcriptional regulation. We deter- 
mine that SHARP association with SRA relies on 
both single- and double-stranded RNA sequences. 
The crystal structure of the SHARP-RRM fragment,
together with the associated RNA-binding studies,
extends the repertoire of nucleic acid binding properties
of RRM domains, suggesting a new hypothesis
for a better understanding of SPEN protein functions.

INTRODUCTION 

The Split Ends (SPEN) gene was discovered in the mid-1990s
through genetic studies linked to homeotic phenotypes
in Drosophila (1,2). The severe developmental problems
observed in knockout animals demonstrated its essential
role (3,4). The rat homolog, called Msx2-Interacting
Nuclear Target (MINT), was later identified independently
in a screen for homeoprotein-interacting proteins during
osteogenesis (5). MINT was shown to localize in the nucleus,
and, in vitro, its predicted RNA recognition motifs (RRMs) 
were shown to interact with the osteocalcin (OC) prox- 
imal promoter region as well as with several homopoly- 
meric DNA sequences (5). MINT was also proposed as 
a transcription repressor of the OC promoter. The reported
functions of SPEN/MINT proteins have consistently been
associated with negative effects on transcription
(6-9). The capacity of SPEN proteins to recruit
transcriptional repressors was later shown to be due to 
a conserved SPEN paralog and ortholog C-terminal domain
(SPOC). This domain mediates the interaction with
the silencing mediator for retinoid and thyroid-hormone 
repressor protein (SMRT/NCoR) as well as with the his- 
tone deacetylase HDAC1 (10-13). The human homolog of 
the SPEN protein was renamed SMRT/HDAC1 Associated 
Repressor Protein or SHARP (we use this name throughout
the manuscript; 13). The structure of the SPOC domain
of SHARP has previously been determined, providing
important molecular details regarding the recruitment
of SMRT by the SPOC domain (10). SHARP also has an
established role in Notch signalling, where SHARP and
Notch-1-IC appear to bind their partner RBP-Jk in a
mutually exclusive manner (12). The region responsible for the stable
recruitment of SHARP to RBP-Jk is located in the central 
part of the protein, between residues 2804 and 2816 (12). 
Simultaneously, several independent studies demonstrated 
the role of SHARP in nuclear receptor-mediated transcrip- 
tional responses (13-15). The three predicted RRMs in 
the N-terminal region of SHARP were shown to mediate 
its negative transcriptional activity in the nuclear recep- 
tor pathway, both in vitro and in vivo (13,15). These effects 
were shown to occur via its association with the non-coding
RNA produced by the steroid receptor activator gene (SRA 
RNA; 13,15). A particular region containing the H12-H13 
substructure of SRA RNA was shown to be sufficient to mediate
SHARP association (15). RRMs are the most abundant
RNA-binding domains (RBDs) present in vertebrates
(they have been found in 0.5-1% of human genes; 16).
Interestingly, biochemical and structural studies of RRMs
have generally shown that, in spite of their structural
similarities, each RRM plays its own specific role in
cellular functions (17,18).

We have determined the crystal structure of the three
RRMs present in the N-terminal part of SHARP. The
atomic model revealed a domain architecture in which RRM3
and RRM4 form a platform, with RRM2 being flexibly linked.
The residues responsible for the interaction between
RRM3 and RRM4 are highly conserved throughout the
SPEN family. Moreover, while RRM3 has the consensus
amino acids for single-stranded RNA association, the
RNA-binding surface of RRM4 is blocked by an α-helix
located immediately downstream of the RRM fold, a situation
reminiscent of the newly defined xRRM present in the
LARP protein (19). The xRRMs have the atypical property
of binding base-paired RNA sequences. We then characterized
the association of the RRMs of SHARP with the
H12-H13 RNA. Point mutations in RRM3 or deletion
of RRM4 strongly destabilize the interaction with the
H12-H13 fragment. The RRM3/RRM4 platform is therefore
crucial for the formation of a stable complex with the
H12-H13 region of the SRA RNA. We suggest that the association
of SHARP with the H12-H13 RNA sequence is
specific and requires stable stem loops including base-paired
sequences. Our structural and biochemical data highlight
the unexpected properties of the SHARP RRMs, which
add a new layer of complexity to the RNA recognition
mode of proteins containing multiple RRMs.



MATERIALS AND METHODS 

Molecular Biology 

The nucleotide sequence encoding residues 335 to 620 of
human SHARP was obtained by gene synthesis (Entelechon)
and named R2-3-4h. The sequence was codon-optimized
for protein expression in Escherichia coli (E. coli)
and was then subcloned in a pET-based vector containing
N-terminal 9XHistidine and Glutathione S-Transferase
tags followed by the Tobacco Etch Virus (TEV) protease
cleavage site. The truncations of SHARP-RRM corresponding
to the constructs R3 (residues 436-512), R2-3 (residues
336 to 513), and R3-4h (residues 436 to 620) were cloned
in the same expression plasmid and produced using the same
protocol. The point mutants (R3mut and R2mut-R3mut)
were obtained by site-directed mutagenesis of the plasmid 
encoding R2-3-4h using a QuikChange mutagenesis kit
(Stratagene) according to the manufacturer's instructions. 
The R3mut construct contains the following mutations: 
F441A, K470A, Y478A, F480A, and K512A. The R2mut- 
R3mut construct contains the same mutations plus K338A, 
Q369A, H371A, E376A, L380A, and F382A. All constructs 
were verified using DNA sequencing prior to large-scale expression.
Primer sequences used for the truncations or the
mutations are provided in Supplementary Table S1.
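
The construct boundaries and point mutations listed above lend themselves to a small lookup table. The sketch below is only an illustration of how they could be recorded and cross-checked; the names CONSTRUCTS, R3MUT, R2MUT, parse_mutation, and positions_outside are hypothetical and do not come from the manuscript, while the residue ranges and mutation labels are copied from the text.

```python
import re

# Residue ranges (inclusive, SHARP numbering) as stated in the text.
# The dictionary layout and all helper names are illustrative assumptions.
CONSTRUCTS = {
    "R2-3-4h": (335, 620),
    "R3": (436, 512),
    "R2-3": (336, 513),
    "R3-4h": (436, 620),
}

# Point mutations of the R3mut construct, and the additional RRM2
# mutations carried by the R2mut-R3mut construct.
R3MUT = ["F441A", "K470A", "Y478A", "F480A", "K512A"]
R2MUT = ["K338A", "Q369A", "H371A", "E376A", "L380A", "F382A"]


def parse_mutation(label):
    """Split a label such as 'F441A' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def positions_outside(construct, mutations):
    """Return the mutations whose position lies outside the construct."""
    start, end = CONSTRUCTS[construct]
    return [m for m in mutations if not start <= parse_mutation(m)[1] <= end]


# Every mutated position lies inside the parent R2-3-4h construct.
print(positions_outside("R2-3-4h", R3MUT + R2MUT))
# The RRM2 mutations fall outside the R3-4h truncation.
print(positions_outside("R3-4h", R2MUT))
```

A check of this kind is a cheap way to catch numbering slips before primers are ordered.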



Protein expression and purification 

All the proteins tested in the subsequent experiments were
produced using the same protocol. Briefly, large cultures
of E. coli BL21 Star cells, previously transformed with a
given SHARP construct, were grown in Terrific Broth medium
(1.2% peptone, 2.4% yeast extract, 72 mM K2HPO4,
17 mM KH2PO4, and 0.4% glycerol) at 37°C for 4 h. This
was followed by overnight induction at 18°C with 0.25
mM isopropyl-β-D-thiogalactopyranoside. Cells were harvested
by centrifugation at 5200 rcf and resuspended
in 50 mM HEPES buffer (pH 7.5) containing 300 mM
NaCl, 20 mM imidazole, 0.1% Triton X-100, DNase I (1
µg/ml), lysozyme (1 µg/ml), 5 mM β-mercaptoethanol,
and a cocktail of protease inhibitors (phenylmethylsulfonyl
fluoride 1 mM, leupeptin 1 µg/ml, and pepstatin 2 µg/ml).
Cells were lysed using an Emulsiflex system (Avestin) and
the lysate was cleared by centrifugation at 33,000 rcf for 30 min at
4°C. The soluble fraction was subjected to an initial affinity
purification using a chelating HiTrap FF crude column
(GE Healthcare) charged with Ni2+ ions. The protein was
eluted with 250 mM imidazole and desalted against 50 mM
HEPES (pH 7.5), 300 mM NaCl, 20 mM imidazole, and 5
mM β-mercaptoethanol. The TEV protease was added at
a ratio of 1/50 (TEV/protein), and the sample was incubated
at 4°C for 3 h. The cleaved protein was separated from
the tags, the protease, and the contaminants by a second
affinity purification on a chelating HiTrap FF crude column
charged with Ni2+ ions. The flow-through was collected and
directly loaded on a Heparin column (GE Healthcare) to remove
the remaining contaminants. Elution was performed
with a linear salt gradient between 0.05 and 1 M NaCl.
The fractions containing the elution peak were analysed by
SDS-PAGE and pooled. The protein was further purified
by gel filtration chromatography with a Superdex 75 column
(GE Healthcare) in 20 mM HEPES (pH 7.5) containing
150 mM NaCl, 1 mM β-mercaptoethanol, 5% (v/v)
glycerol, and 5 mM MgCl2. The protein eluted from the
gel filtration column as a monomer and was concentrated
(MWCO 30 kDa; Amicon) to 10 mg/ml. The concentrated
protein was used immediately for crystallization or stored
at −80°C. All protein samples were more than 95% pure as
judged by Coomassie-stained SDS-PAGE gels.

Crystallization 

The SHARP construct, encompassing residues 335 to 620, 
was crystallized at a concentration of 10 mg/ml (0.3 mM) 
using the sitting-drop vapour diffusion technique at 4°C, 
mixing an equal volume of the protein and the reservoir 
solution. Initial screening with nanolitre-volume drops
gave one crystal in one condition (Natrix screen, Hampton
Research); this was further optimized to obtain diffraction-quality
crystals. Final crystals were grown from a mixture
of 0.5 µl of protein with 0.5 µl of mother liquor (0.1 M ammonium
sulphate, 0.01 M magnesium chloride hexahydrate,
0.05 M MES monohydrate pH 5.6, and 16% w/v polyethylene
glycol 8000). Prior to flash freezing in liquid nitrogen,
crystals were put into perfluoropolyether CryoOil (Hampton
Research).
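
The molar concentration quoted for the crystallization sample (10 mg/ml, about 0.3 mM) can be checked with a short calculation. The sketch below is a rough estimate only: it assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from the manuscript, and the function names are hypothetical. The exact molecular mass depends on the sequence and on any residues left after TEV cleavage.

```python
# Assumption: ~110 Da per residue is a rule-of-thumb average, not a
# value reported in the manuscript.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_da(first_residue, last_residue):
    """Estimate the mass of a fragment from its inclusive residue range."""
    n_residues = last_residue - first_residue + 1
    return n_residues * AVERAGE_RESIDUE_MASS_DA


def mg_per_ml_to_millimolar(conc_mg_per_ml, mass_da):
    """Convert mg/ml to mM: (g/l) / (g/mol) gives mol/l, times 1000 gives mM."""
    return conc_mg_per_ml / mass_da * 1000.0


mass = approx_mass_da(335, 620)            # construct R2-3-4h, 286 residues
conc = mg_per_ml_to_millimolar(10.0, mass)
print(f"estimated mass: {mass / 1000:.1f} kDa")
print(f"10 mg/ml is roughly {conc:.2f} mM")
```

With these assumptions the estimate rounds to the 0.3 mM stated in the text.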

Data collection, structure determination, and refinement 

Diffraction data were collected at the European Syn- 
chrotron Radiation Facility (ESRF, Grenoble) and pro- 
cessed using XDS (20). Data collection statistics are shown 
in Table 1. The structure of R2-3-4h was determined 
by molecular replacement using the PHASER programme 
from the CCP4 package (21). We used three ensembles, each 
one containing several RRM structures as models: Ensem- 
ble 1 (PDB codes: 3MD1, 2ADC, 2DNM, 2DGU, 2CQB), 
Ensemble 2 (PDB codes: X4AR, 2YTC, 4F26, 1WHY, 
2CPZ), and Ensemble 3 (PDB codes: 2138, 1WHY, 1x55, 
2LCW, 2CPE). Because the quality of the original densities 
was not good enough to allow reconstruction of our protein 
model, we performed several cycles of density modification 
using the SOLOMON programme (22); this gave a readily
interpretable electron density map. The final model was
built using the COOT molecular graphics programme (23)
and refined using the phenix.refine routine of the PHENIX
package (24). We included TLS refinement using the
following groups: group 1, amino acids 335-411; group 2,
amino acids 412-429; group 3, amino acids 430-513; group
4, amino acids 514-588; and group 5, amino acids 589-620
(25). The coordinates and the structure factors have been 
deposited in the PDB with accession code 4P6Q. 
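
The TLS groups used in refinement can be written down as a small lookup table, which also makes it easy to check that they tile the construct from residue 335 to residue 620 without gaps or overlaps. The sketch below is illustrative only: the names TLS_GROUPS, tls_group, and tiles_without_gaps are hypothetical, it assumes that the first two groups end at residues 411 and 429 (the boundaries implied by group 3 starting at residue 430), and it is not the input format expected by phenix.refine.

```python
# (group number, first residue, last residue), inclusive ranges.
# The end points of groups 1 and 2 are an assumption inferred from
# the start of group 3 at residue 430.
TLS_GROUPS = [
    (1, 335, 411),
    (2, 412, 429),
    (3, 430, 513),
    (4, 514, 588),
    (5, 589, 620),
]


def tls_group(residue):
    """Return the TLS group number that contains the given residue."""
    for group, start, end in TLS_GROUPS:
        if start <= residue <= end:
            return group
    raise ValueError(f"residue {residue} is outside the refined model")


def tiles_without_gaps(groups):
    """True if each group starts right after the previous one ends."""
    return all(prev[2] + 1 == nxt[1] for prev, nxt in zip(groups, groups[1:]))


print(tiles_without_gaps(TLS_GROUPS))
print(tls_group(441), tls_group(520))  # one residue in RRM3, one in RRM4
```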

Electrophoretic mobility shift assay (EMSA)

Radiolabeled RNAs were synthesized in vitro using
T7 RNA polymerase (prepared in-house) in the presence of
[α-32P]-UTP. The transcription reactions were performed as
previously described (26). We used various DNA templates to
produce the RNA sequences used in our EMSA. The purified
and radiolabeled RNAs (0.1 pmol) were incubated
with increasing concentrations of the different SHARP constructs
for 20 min at room temperature in 20 mM HEPES
(pH 7.5) containing 150 mM NaCl, 5 mM MgCl2, and
0.5 U of Superase (Invitrogen). Samples were then mixed
with native loading dye (50% glycerol, 0.2% bromophenol blue,
and 0.2% xylene cyanol) and loaded on a 5% native polyacrylamide
gel. Gels were run in standard 1X Tris-borate-EDTA
buffer at 200 V for 2 h at 4°C and visualized by
autoradiography.

For the competition assays, the cold RNAs corresponding
to the H12-H13, H12, H13, and H7 sequences were
synthesized in vitro using T7 RNA polymerase as indicated
above. Purified RNAs were annealed at low concentration
by heating them at 90°C for 5 min followed by a slow reduction
of temperature to 30°C over an extra hour, and were
subsequently kept on ice. The short single-stranded RNAs
LA, LB, LC, UL, and USL were ordered (Thermo Scientific,
Dharmacon), deprotected and resuspended in water
before annealing as recommended by the manufacturer.
The R2-3-4h/H12-H13 complex was prepared as described
above. We then added the cold RNAs at different concentrations
and incubated them for an additional 10 min. The
samples were subsequently analysed on a 5% native gel as
described above.

RESULTS 

Structure of the SHARP-RRM fragment 

The SHARP protein has three RRMs located in its N- 
terminal region. The same protein fragment has previously 
been shown to bind to and modulate the transcriptional 
activity induced by the SRA RNA (13). We determined
the atomic structure of the protein construct encompassing
residues 335 to 620 (SHARP-RRM or R2-3-4h) at a
resolution of 2.0 Å. The model contains the three predicted
RBDs plus two helices at the C-terminus (Figure 1A and B;
see statistics in Table 1). The crystal structure revealed that
the predicted RBDs fold into three individual RRMs with
the typical βαββαβ topology. These RBDs were named
RRM2, RRM3, and RRM4 as the N-terminal region of
SHARP was predicted to have an additional RRM (encompassing
residues 6 to 81). The SHARP protein fragment
is organized in two blocks, made of RRM2 on
one hand and RRM3-RRM4 on the other. A flexible
linker connects these two blocks (Figure 1B). Furthermore,
the C-terminal tail folds into two consecutive α-helices immediately
after RRM4 (Figure 1C and D). The first of
these two helices interacts extensively with the RRM4 β-sheet
surface, thereby covering the putative RNA-binding
surface of this RRM (Figure 1C and D). The presence of a
helical extension immediately following the predicted RBDs
has been observed multiple times in other RRM-containing
proteins (27-29). In a few cases, specific functions have been
attributed to these C-terminal helices. For instance, the C-terminal
helix C in the U1A protein has been shown to reorient
upon RNA association (30,31), and the helical extension
Figure 1. Overall structure of SHARP-RRMs. (A) Schematic representation of the domain organization of the SHARP protein. (B) The overall structure
of R2-3-4h is shown: RRM2 is coloured in yellow, RRM3 in orange, RRM4 in red, C-terminal helices in grey, and linker regions are coloured in green. 
(C) and (D) Top (C) and side (D) views of the RRM4 plus the C-terminal a-helix. The protein is shown as a cartoon and coloured in red for the RRM 
and grey for the helix. (E) Superposition of various RRMs followed by C-terminal helices. The xRRM found in the p65 protein is coloured in green (PDB 
code 4EYT). The qRRM 1 and 2 from the hnRNP F in complex with the AGGGAU hexa-ribonucleotide are coloured in gold and magenta, respectively. 
The nucleotides are coloured according to atom types (carbon: yellow; nitrogen: blue; oxygen: red; and phosphate: orange). SHARP is coloured in red with
the helix coloured in grey. Proteins are shown as a cartoon and residues involved in intramolecular interactions in panels C and D are shown as sticks and 
coloured according to atom type (carbon: cyan; nitrogen: blue; and oxygen: red). 
found in the splicing factor protein p14 is responsible for
the tight association with the splicing factor SF3b155 (27).
The hnRNP F protein contains three qRRM
domains whose β-sheet surfaces are partially covered
by a C-terminal helix (32). It was further shown that qRRMs
bind their RNA target using the loop regions rather
than the β-sheet surface (29). Another example is found in the
recently described xRRM present in the p65 protein (19).
This atypical type of RRM is present in members of the La
protein family and is known to interact with specific sites
of the telomerase RNA (33). The C-terminal helix found
in the xRRM makes it capable of establishing a combination
of single- and double-strand-mediated interactions
with its RNA target (19). Structural superimposition of
SHARP RRM4 including the C-terminal helix onto individual
RRM domains from the hnRNP F or the p65 proteins
clearly shows the conserved location of the additional
secondary structure element found in the SHARP protein
(Figure 1E). With the notable exception of U1A, all
RRMs that are followed by a C-terminal helix are unable
to bind RNA through the β-sheet surface, suggesting that
RRM4 could follow this trend (Figure 1D and E).

Comparison of the predicted domain organization of annotated
SPEN protein family members, ranging from mammalian
to plant species, underlines the conserved organization
of the N-terminal region within this family (Supplementary
Figure S1). This conserved region can be divided
into two modules: one which contains an RRM (RRM2
in SHARP) and a second module containing two RRMs
(RRM3 and RRM4 in SHARP). These two blocks are connected
through a relatively long linker of ~20-30 amino
acids (Supplementary Figure S1). Interestingly, the plant
proteins belonging to the SPEN family have an atypical
primary sequence organization. In these species, the RRM
corresponding to the human RRM2 is located after the
RRM3/RRM4 block (Supplementary Figure S1). We suggest
that, in the plant clade, the RRM2 coding sequence has
been repositioned after the RRM3/RRM4-coding block 
at the primary sequence level. Such domain reorganiza- 
tion is not unusual in eukaryotic multidomain proteins, 
and may be due to the development of an alternative splic- 
ing mechanism (34,35). We used the Consurf server to 
map the sequence conservation onto the atomic surface 
of SHARP-RRM (36). This analysis clearly identified the 
RRM3/RRM4 interface as containing multiple residues 
highly conserved across eukaryotic species, with the plant 
proteins being the least conserved (Figure 2A, B and C; 
Supplementary Figure S2). This interface relies on amino 
acids from RRM3 and RRM4 domains and in particular 
residues R438, Q482, W520, and D585 (Figure 2D). Other 
residues connecting RRM2 to RRM3, RRM3 to RRM4 
and to the C-terminal a-helix following RRM4 are also part 
of the interface (Figure 2D). Furthermore, the particular 
orientation observed between RRM3 and RRM4 appears 
to be unique when compared with other protein structures 
containing multiple RRMs (Supplementary Figure S3). 
Clearly, the respective orientation of RRM3 and RRM4 
as well as the complex surface of interaction observed in 
our atomic model has not been reported before. Therefore, 
the presently described architecture of RRM3/RRM4 doc- 
uments a new mode of interaction between RRMs. 




Figure 2. RRM3-RRM4 interface conservation. (A) RRM3 and RRM4 
(shown as cartoon and coloured in orange and red, respectively) are in 
close contact. Bottom part shows the same protein fragment using a sur- 
face representation coloured according to surface conservation (cyan: low 
conservation and magenta: high conservation). (B) and (C) The RRM3-RRM4
interface is facing the reader, and the protein surface is coloured as
in panel A. (D) Magnified view of the RRM3-RRM4 interface of SHARP: 
residues located at the interface are shown as sticks and coloured according 
to atom types (carbon: slate or cyan; nitrogen: blue; and oxygen: red). 



To further validate the proposed modular organization 
of the SHARP N-terminal region consisting of the RRM2 
loosely attached to the RRM3/RRM4 unit, we performed 
small angle X-ray scattering (SAXS) experiments on the 
SHARP-RRM fragments R2-3-4h and R3-4h (Supplementary
Figure S4). The SAXS data are well fitted
using either the atomic models obtained here or an
ab initio procedure (Supplementary Figure S4A). Docking
the atomic structures corresponding to RRM2 and R3-4h 
fragments connected by a flexible linker into the envelope 
calculated with the R2-3-4h data is possible with a slight 
reorientation of RRM2 as expected from its loose attach- 
ment with the RRM3/RRM4 unit (Supplementary Figure 
S4B). The R3-4h fragment of the structure can be fitted 
into both envelopes without any changes, strongly suggest- 
ing that RRM3 and RRM4 domains form a stable unit even 
in the absence of RRM2 (Supplementary Figure S4B and 
C). We further suggest that the conservation of this structural
design reflects its importance for SPEN protein function, most
probably because it governs the atypical RNA-binding properties
of these proteins (see below).
Identification of the RNA-binding properties of the SHARP-RRM

Using gel shift assays, we then investigated how the
SHARP-RRM fragment could recognize nucleic acids.
Typically, three aromatic side chains located on β-strands
1 and 3 are essential for RNA binding in canonical RRMs
(for a review, see 17,37). In the SHARP protein, RRM2
(partially), RRM3 and RRM4 exhibit aromatic residues at
the standard positions within their folded domains (Figure
3). However, the C-terminal helix of RRM4 covers
these conserved residues through multiple hydrophobic interactions
(Figure 2D). The strong association of the α-helix
with the β-sheet in RRM4 likely prevents RNA association
at this location (Figures 2D and 3C). Moreover, the electrostatic
surface potential of the SHARP-RRM fragment
clearly indicates the existence of positively charged regions
near the β-sheet of RRM3 as well as in the loop connecting
RRM3 to RRM4 and in RRM4 (Supplementary Figure
S5). To further understand how SHARP-RRM could interact
with RNA, we measured RNA association with multiple
SHARP constructs.

SHARP has two reported nucleic acid targets, the OC
promoter region and the H12-H13 substructure found in
the long non-coding SRA RNA (5,13,15). The latter is a
highly conserved, 85-nucleotide RNA element that
adopts a stable secondary structure (38). Based on our
atomic model, we designed different protein mutants or
truncations and tested their ability to bind to the H12-H13
RNA fragment (Figure 4). We were able to confirm that our
crystallized construct binds to the SRA RNA H12-H13
sequence with an apparent dissociation constant in the low
micromolar range (Kd,app = 1.75 µM; Figure 4A and B). As
the three RRMs can potentially bind nucleic acid molecules,
we subsequently tested several constructs corresponding
to different combinations of RRMs. We started with
the construct R3-4h, where RRM2 was deleted while
keeping the RRM3-RRM4 block and the C-terminal helices
intact (Figure 4C). This construct is able to recognize
the H12-H13 RNA fragment with a similar affinity,
clearly indicating that RRM2 is not essential for the stable
recognition of the SRA RNA (Figure 4C). We then mutated
several residues of RRM3 predicted to be important
for RNA binding: (i) three aromatic residues found in
the RNP1 and RNP2 motifs (F441, Y478, and F480); and (ii)
two lysine residues found in the β2-β3 loop and in the
linker between RRM3 and RRM4 (K470 and K512, respectively).
We verified by gel filtration chromatography and circular
dichroism that the construct with the RRM3 domain mutated
(R3mut) was still stable and folded; both measurements strongly
suggested that the mutant is well folded (Supplementary
Figures S6 and S7). Though folded, the R3mut construct
was unable to form a stable complex with the H12-H13
RNA fragment, indicating that the mutated residues within
RRM3 are important for RNA association (Figure 4D).
We then tested RRM4 and the C-terminal helices. Unfortunately,
the deletion of the C-terminal helices makes the protein
fragment highly insoluble (data not shown). We thus
decided to delete the entire region containing RRM4
and the helices (R2-3 construct), which resulted in a stable
and folded polypeptide (Supplementary Figures S6 and
S7). We further expressed RRM3 alone and showed that it
also behaved as a stable folded domain (Supplementary Figures
S6 and S7). EMSA experiments with the R2-3 protein
revealed that the RNA-binding capacity of this construct
was impaired, indicating that the RRM4 domain and the
helices also play a crucial role in RNA association (Figure
4E). Then, we tested whether RRM4 could bind the RNA on its
own by mutating the canonical residues for RNA binding
in both RRM2 and RRM3 (R2mutR3mut construct; mutated
residues are listed in the Materials and Methods section).
The RNA association was reduced to a background
level, indicating that RRM4 and the C-terminal helices
cannot stably bind the RNA on their own (Figure 4F).
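
The apparent dissociation constant reported for R2-3-4h was obtained by fitting the EMSA bound fraction to the Hill equation (Figure 4B). The sketch below illustrates the principle on synthetic data only. It is not the authors' fitting procedure: it uses a simple least-squares fit on the linearized Hill plot, the function names are hypothetical, the titration points are generated from the equation itself rather than measured, and the Hill coefficient of 1.5 is an arbitrary value chosen for illustration.

```python
import math


def hill_fraction(protein_uM, kd_uM, n):
    """Fraction of RNA bound according to the Hill equation."""
    return protein_uM ** n / (kd_uM ** n + protein_uM ** n)


def fit_hill(concentrations_uM, fractions):
    """Estimate (Kd, n) by least squares on the linearized Hill plot.

    log(f / (1 - f)) = n * log([P]) - n * log(Kd)
    Points with f <= 0 or f >= 1 are skipped (the logit is undefined).
    """
    xs, ys = [], []
    for conc, frac in zip(concentrations_uM, fractions):
        if conc > 0 and 0.0 < frac < 1.0:
            xs.append(math.log(conc))
            ys.append(math.log(frac / (1.0 - frac)))
    if len(xs) < 2:
        raise ValueError("need at least two usable titration points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                       # slope = Hill coefficient
    intercept = mean_y - n * mean_x     # intercept = -n * log(Kd)
    kd = math.exp(-intercept / n)
    return kd, n


# Synthetic, noise-free titration (NOT experimental data).
concs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
fracs = [hill_fraction(c, kd_uM=1.75, n=1.5) for c in concs]
kd, n = fit_hill(concs, fracs)
print(f"Kd,app = {kd:.2f} uM, Hill coefficient = {n:.2f}")
```

On real gels the bound fraction is noisy near 0 and 1, where the logit transform amplifies errors, so a direct non-linear fit of the untransformed data is usually the more robust choice.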

As we show above, RRM3 is tightly associated with 
RRM4 in our crystal structure and this association appears 
conserved within the SPEN family. These RNA-binding ex- 
periments strongly suggest that the RNA-binding proper- 
ties of SHARP are mostly due to the newly characterized 
RRM3/RRM4 block (Figures 2B and 4C). Although the 
exact role of each RRM is not yet fully understood, we
propose that there is a strong cooperative effect between
the two RRMs based on our atomic structure and on the
RNA-binding experiments. The precise positioning of the
two RRMs in space through their specific surface of interaction
leads to high-affinity binding properties, as previously
observed for other multi-RRM-containing proteins, such as
PTB (39,40).

The RNA structural context is important for SHARP- 
RRM/SRA interaction 

To further understand the RNA-binding capacities of 
SHARP, we investigated the RNA sequence driving the association,
i.e. the 85 nucleotide-long H12-H13 fragment of
SRA. We measured the ability of different SRA RNA fragments
to displace the preformed R2-3-4h/H12-H13 complex
(Figure 5A). We first tested the full-length H12-H13,
which competed very efficiently, while a single-stranded
A15 oligomer did not displace the preformed SHARP-RRM/SRA
RNA complex (Figure 5B, lanes 5-7 and 8-10).
Most RRM domains have been shown to bind to short
unstructured ssRNAs (41,42). The CISBP-RNA server
was used to predict the best RNA-binding motif for
RRM3 (43). The best RNA-binding sequence predicted
is GUGUG, and the second best predicted sequence is
ACACA. Based on these two predictions, an RNA sequence
with alternating purines and pyrimidines is
likely to bind to RRM3. Such a consensus sequence is
partially found in the LA loop. We tested three individual
loop regions of H12-H13, i.e. the LA, LB, and LC regions,
which did not displace the preformed complex even when
added in 100-fold molar excess (Figure 5B, lanes 11-19).
This indicates that the SHARP association with RNA is
unlikely to be driven by non-specific interactions mediated
by charged residues, as observed with proteins bound non-specifically
to RNA (44,45). These experiments with individual
loops also suggested that the RNA sequence responsible
for the stable and specific association with SHARP-RRMs
is not restricted to single-stranded regions. We
then split the H12-H13 RNA into two parts, namely H12 
Figure 3. Individual RRM structures and the consensus amino acids for RNA binding. (A), (B), and (C) RRM2, RRM3, and RRM4 are shown as a
cartoon and coloured in yellow, orange, and red, respectively. Amino acids at the canonical position for RNA interaction are shown as sticks and coloured
as in Figure 1.



and H13, and performed similar experiments in an attempt
to narrow down our search for the specific determinant behind
SHARP/RNA recognition (Figure 5C). Both the H12
and the H13 RNA fragments were able to displace the preformed
complex, although they required a larger molar excess
than the full H12-H13 fragment (Figure 5C, lanes 7-10
and 11-14). This result indicated that the LA, but also
the LB or LC, sequences bound efficiently to SHARP when
embedded in stem-loop structures. In order to
test for the possibility that the specific SHARP/RNA interactions
were due to a folded H12-H13 RNA, we performed
competition experiments using three unrelated sequences
with diverse primary and secondary structures. We
chose another fragment of SRA (H7) as a further example
of RNA with multiple predicted stem-loop secondary
structures (38). We used a 17 nt ssRNA sequence (UL),
which has multiple purine-pyrimidine dinucleotides. The
third RNA corresponded to the same UL sequence plus
a short, well-characterized stem loop (USL). We observed
that the H7 and USL fragments competed with the preformed
protein/RNA complex, while the ssRNA sequence
UL did not compete for the R2-3-4h association (Figure
5D). This experiment clearly indicated that secondary structures
within the RNA are necessary for SHARP/RNA interaction.
Notably, the H7 and the USL sequences needed
to be added in larger molar excess than the H12-H13 fragment
to achieve the same competitive effect, indicating that
SHARP-RRM/H12-H13 association is specific (compare
lanes 5, 9, and 17 in Figure 5D). Examined together, these
experiments point towards a complex mode of association
between SHARP-RRMs and the H12-H13 fragment: a
mode which is not limited to an association mediated by
a 3-4 nucleotide single-stranded sequence likely located in
the loop LA and the β-sheet surface of RRM3. The H12-H13
secondary structure was recently shown to be identical
both in isolation and within the full-length SRA RNA (46).
It seems therefore unlikely that SHARP-RRM recognizes
a particular tertiary fold adopted by the H12-H13 fragment.
We also tested the competing capacities of the single-stranded
and duplex DNA sequences found in the OC promoter
region, previously identified as a specific binding site
for the SHARP protein (5). Even though SHARP is able
to bind to the OC promoter ssDNA or duplex DNA (Supplementary
Figure S8), none of the DNA sequences displaced
the preformed protein-RNA complex in our competition
assays (Supplementary Figure S8). This further validates
the binding specificity of SHARP-RRMs for the SRA
RNA H12-H13 element. In a final effort to identify
the contact points between SHARP and the SRA RNA,
we decided to crosslink the H12-H13 SRA RNA to the
R3-4h protein construct. After UV irradiation of the complex,
we isolated and primer extended the crosslinked RNA
(Supplementary Figure S9). Several weak crosslinks were
observed in the single-stranded regions of the H12-H13
RNA (Supplementary Figure S9), reinforcing the idea that
SHARP has multiple contact points within the RNA. The
major crosslinking events happened at the UG dinucleotides
of the loop LA (positions 36-37 and 39-40 of the H12-H13
element; Supplementary Figure S9), which corroborates
the predicted binding site for RRM3, i.e. an alternating
pyrimidine/purine sequence.
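
Both motifs predicted for RRM3 (GUGUG and ACACA) share one feature, a strict alternation of purines and pyrimidines, whereas the A15 competitor has none. The sketch below shows one way such stretches could be located in an RNA sequence. It is an illustration only: the function names are hypothetical, the scan is not the CISBP-RNA prediction method, and only sequences quoted in the text are used as examples.

```python
PURINES = set("AG")
PYRIMIDINES = set("CU")


def base_class(base):
    """Classify a nucleotide as purine ('R') or pyrimidine ('Y')."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unexpected nucleotide: {base!r}")


def longest_alternating_run(rna):
    """Return (start, length) of the longest stretch in which purines
    and pyrimidines strictly alternate (0-based start index)."""
    rna = rna.upper().replace("T", "U")
    if not rna:
        return 0, 0
    classes = [base_class(b) for b in rna]
    best_start, best_len = 0, 1
    run_start = 0
    for i in range(1, len(classes)):
        if classes[i] == classes[i - 1]:
            run_start = i               # alternation broken, restart here
        run_len = i - run_start + 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len


for name, seq in [("GUGUG", "GUGUG"), ("ACACA", "ACACA"), ("A15", "A" * 15)]:
    start, length = longest_alternating_run(seq)
    print(f"{name}: longest alternating run = {length} nt at position {start}")
```

A scan of this kind only flags candidate single-stranded sites; as the competition assays show, such a site alone is not sufficient for stable binding without the surrounding stem-loop context.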



DISCUSSION 

The SHARP protein is an important regulator of various 
transcriptional processes such as nuclear receptor-mediated 
responses, Notch-mediated transcriptional activation, or 
osteoblast differentiation (5,12,13). SHARP belongs to a 
protein family characterized by a SPOC domain at the C- 
terminus, and at least three RBDs at the N-terminus (Figure 3A).
It is a large multidomain protein of ~400 kDa,
which renders its structural characterization very difficult.
We have solved the structure of the N-terminal protein fragment
containing three RRM domains, enhancing our understanding
of RRM-containing proteins that are associated
with negative transcriptional regulation (47-49).

Figure 4. Identification of the protein region responsible for the SRA RNA association. (A) Representative EMSA used to quantify R2-3-4h construct
association with the SRA RNA fragment H12-H13. (B) Quantification of the EMSA experiment shown in panel A. The bound fraction was quantified
and analysed using the Hill equation, and the fit is shown as a solid line. The calculated apparent dissociation constant (Kd,app) and the quality of the fit are
indicated. (C) EMSA autoradiogram obtained when measuring the RNA-binding capacity of the R3-R4h construct. (D) EMSA autoradiogram showing
the very weak RNA-binding activity of the R3mut construct. (E) EMSA autoradiogram showing the poor RNA-binding activity of the R2-3 construct.
(F) EMSA autoradiogram obtained using the R2mutR3mut construct and indicating that the RRM4 plus the two helices participate only marginally in
RNA association.
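
The Figure 4 legend states that the bound fractions from the EMSA were analysed with the Hill equation to obtain an apparent dissociation constant. As an illustration only, the sketch below shows one simple way such an estimate can be made: a linearized Hill plot fitted by ordinary least squares. The fitting procedure actually used in this study is not described in this section, the function names are hypothetical, and the titration values are synthetic and noise-free. For real, noisy data a nonlinear least-squares fit would normally be preferred.

```python
import math


def hill(protein_conc, kd_app, n):
    """Fraction of RNA bound at a given protein concentration (Hill equation)."""
    return protein_conc ** n / (kd_app ** n + protein_conc ** n)


def fit_hill_linearized(concs, bound_fractions):
    """Estimate (kd_app, n) from a Hill plot.

    Uses log(theta / (1 - theta)) = n * log(P) - n * log(Kd) and ordinary
    least squares. Points with theta <= 0 or theta >= 1 are skipped because
    the transform is undefined there.
    """
    xs, ys = [], []
    for p, theta in zip(concs, bound_fractions):
        if p > 0 and 0.0 < theta < 1.0:
            xs.append(math.log(p))
            ys.append(math.log(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two usable data points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    n = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n * mean_x    # intercept = -n * log(Kd)
    kd_app = math.exp(-intercept / n)
    return kd_app, n


# Synthetic titration (hypothetical values, NOT data from this study).
concs_nM = [5, 10, 25, 50, 100, 250, 500]
fractions = [hill(c, kd_app=50.0, n=1.5) for c in concs_nM]

kd_est, n_est = fit_hill_linearized(concs_nM, fractions)
print(f"Kd,app = {kd_est:.1f} nM, Hill coefficient = {n_est:.2f}")
```

The linearized form is easy to follow, but it gives too much weight to points near 0 or 1 bound fraction, which is why it is shown here only as a sketch.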

With the increasing amount of structural information 
on multidomain proteins, it has become clear that pro- 
teins with multiple RRMs generally adopt unique architec- 
tures; this helps them achieve higher binding affinity and 
specificity for their RNA targets (50). One central ques- 
tion associated with such proteins is the functional rele- 
vance of these interdomain contacts. We identified an unex- 
pected and stable interaction between RRM3 and RRM4. 
The surface of interaction found between these two pro- 
tein subdomains is highly conserved throughout evolution. 
The RRM3 and RRM4 domain association is stable, as observed
by SAXS measurements (Supplementary Figure S4).
Furthermore, we show that both RRMs are needed in or- 
der to achieve a stable RNA association (Figure 4C-F). 
The RRM3 domain contains the canonical residues allowing
this domain to bind RNA. However, the association
of the SHARP-RRM fragment is not limited to the short
single-stranded RNA-binding region present in RRM3,
as none of the single-stranded loops present in the H12-H13
sequence can compete efficiently with the binding of
the full-length RNA fragment. We found that the presence
of double-stranded RNA sequences in the competitor is important to
displace the preformed complex between SHARP-RRM
and the SRA RNA. The β-sheet surface of RRM4 strongly
interacts with an α-helix, a situation reminiscent of the one
encountered in various other RRMs, some of which have been shown
to bind a double-stranded region within their RNA targets (19).
We propose that the RRM3/RRM4 unit contributes
to the overall RNA-binding affinity of SHARP,
with a preference for alternating UG sequences located in
a structured loop (stem loop). We also show that SHARP
affinity is higher for ribonucleotide sequences, as the ssDNA or
duplex DNA sequences previously shown to interact with
SHARP in vitro (5) were not able to compete efficiently for
binding (Supplementary Figure S8). Our structural and
biochemical studies uncovered a new architecture for a multi-RRM
protein, extending the already broad repertoire of nucleic acid
recognition modes of these domains, and underlined the preference
of this protein for UG-containing structured stem loops.

Figure 5. SHARP-RRM association with RNA relies on stem-loop structure. (A) Secondary structures and sequences of the RNA molecules used for
the competition assays reported in panels B, C, and D. The structural architecture of the H12-H13 fragment is based on that established by Novikova et
al. (38). The loop regions and the single-strand RNA are coloured in blue (LA), yellow (LB), red (LC), and green (UL). (B) RNA-binding competition
assays to determine the loop sequences mediating association between SHARP-RRM (R2-3-4h construct) and the SRA RNA H12-H13. Free RNA and
R2-3-4h complex are shown in lanes 1 and 2. Competition assays were performed with the following RNAs: loop LA (lanes 11-13), loop LB (lanes 14-16),
and loop LC (lanes 17-19). (C) Similar competition experiments performed with individual substructures H12 and H13 from the SRA RNA. Competition
assays were performed with H12 RNA (lanes 7-10) and H13 RNA (lanes 11-14). (D) Competition experiments performed with various unrelated RNAs
showing the importance of a stem region for the competitive capacity of the RNA. Competition assays were performed with cold H7 RNA (lanes 7-10),
unrelated loop RNA (UL, lanes 11-14), and unrelated stem-loop RNA (USL, lanes 15-18). Cold RNAs were added in molar excess as
indicated in each panel. Positive and negative controls for efficient competitions are identical in panels B, C, and D (labelled H12-H13 or A15, respectively).

Although the exact function of the SHARP protein is still 
the subject of intense research, we would like to speculate 
that the key to understanding its various reported interact- 
ing partners lies in the highly atypical nucleic acid bind- 
ing properties presently observed (51,52). SHARP is a pro- 
tein scaffold that is found in various transcriptional reg- 
ulatory complexes acting around the polymerase holoen- 
zyme. Moreover, long ncRNAs have been proposed to 
mimic an open promoter structure to regulate polymerase 
II-mediated transcription (53). We propose that the SRA 
RNA may mimic an active transcription unit, which would
account for its capacity to recruit SHARP and could
help explain the many functions attributed to this long
ncRNA.